Abstract

We present a simple and general result: the sign of the variations or increments of uncorrelated
time series is predictable with a remarkably high success probability of 75% for symmetric sign
distributions. The origin of this paradoxical result is explained in detail. We also present some tests on
synthetic, financial and global temperature time series.

1 Introduction 

Predicting the future evolution of a system from the analysis of past time series is the quest of many
disciplines, with a wide range of useful potential applications including natural hazards (volcanic eruptions,
earthquakes, floods, hurricanes, global warming, etc.), medicine (epileptic seizures, cardiac arrest, parturition,
etc.) and stock markets (economic recessions, financial crashes, investments, etc.). The absolutely fundamental
prerequisite is that the (possibly spatio-temporal) time series $x_1, x_2, \ldots$ possesses some dependence of the
future on the past. If such dependence is absent, the best prediction of the future is captured by the mathematical concept of
a martingale: the expectation $E(x_{t+1}|\text{past})$ of the future conditioned on the past is the last realisation $x_t$.
In many applications, one is interested in the variation $x_{t+1} - x_t$ of the time series.

The result we present below is, in one sense, obvious and, in another, quite counter-intuitive. Starting
from a completely uncorrelated time series, we know by definition that future values cannot be
predicted better than by a random coin toss. However, we show that the sign of the increments of future values can
be predicted with a remarkably high success rate of up to 75% for symmetric time series. The derivation
is straightforward, but the counter-intuitive result warrants, we believe, its exposition. This little exercise
illustrates how tricky the assessment of predictive power and statistical testing can be.

2 First derivation 

Consider a time series $x(t)$ sampled at discrete times $t_1, t_2, \ldots$, which can be equidistant or not. We denote by
$x_1, x_2, \ldots$ the corresponding measurements. We assume that the measurements $x_1, x_2, \ldots$ are i.i.d. (independent
and identically distributed). Consider first the simple case where $x_1, x_2, \ldots$ are uniformly and independently
drawn in the interval $[0, 1]$, so that the average value or expectation is $E(x) = 1/2$.



2.1 Prediction scheme 

We ask the following question: based on previous values up to $x_i$, what is the best predictor for the increment
$x_{i+1} - x_i$? A naive answer would be that, since the $x$'s are independent and uncorrelated, their increments
are also independent and the best predictor for the increment $x_{i+1} - x_i$ is zero (martingale choice). This
turns out to be wrong. Indeed, while the unconditional expectation of the increment is

$$E(x_{i+1} - x_i) = E(x_{i+1}) - E(x_i) = 1/2 - 1/2 = 0, \qquad (1)$$

the conditional expectation $E(x_{i+1} - x_i | x_i)$, conditioned on the last realization $x_i$, is

$$E(x_{i+1} - x_i | x_i) = E(x_{i+1} | x_i) - E(x_i | x_i) = 1/2 - x_i, \qquad (2)$$

where the term $1/2$ uses the independence between $x_{i+1}$ and $x_i$ ($E(x_{i+1}|x_i) = E(x_{i+1}) = 1/2$) and the last
term in the r.h.s. uses the identity $E(x_i|x_i) = x_i$. We thus see that the sign of the increment has some
predictability:

• if $x_i > 1/2$, the expectation is that $x_{i+1}$ will be smaller than $x_i$;

• if $x_i < 1/2$, the expectation is that $x_{i+1}$ will be larger than $x_i$.
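Equation (2) can be checked numerically by binning the conditioning value $x_i$ and averaging the next increment in each bin. The sketch below is our own illustration, not the authors' code: it assumes Python's standard-library `random` module as the uniform generator, and the function name `conditional_increment_means`, the seed, the sample size and the number of bins are arbitrary, hypothetical choices.

```python
import random

def conditional_increment_means(n=200_000, bins=10, seed=1):
    """Estimate E(x_{i+1} - x_i | x_i in bin) for i.i.d. uniform x on [0, 1].

    Returns (bin_center, empirical_mean) pairs; eq. (2) predicts that
    empirical_mean is close to 1/2 - bin_center.
    """
    rng = random.Random(seed)
    sums = [0.0] * bins
    counts = [0] * bins
    x_prev = rng.random()
    for _ in range(n):
        x_next = rng.random()
        b = min(int(x_prev * bins), bins - 1)  # bin of the conditioning value x_i
        sums[b] += x_next - x_prev              # accumulate the increment x_{i+1} - x_i
        counts[b] += 1
        x_prev = x_next
    return [((b + 0.5) / bins, sums[b] / counts[b]) for b in range(bins) if counts[b]]

for center, mean in conditional_increment_means():
    print(f"x_i ~ {center:.2f}: mean increment {mean:+.3f}, eq. (2) gives {0.5 - center:+.3f}")
```

The empirical means should change sign around $x_i = 1/2$, as the two cases above state.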

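This predictability can also be seen from the fact that the increments of $x(t)$ are anti-correlated. Using
$E(x_j x_k) = E(x_j)E(x_k) = 1/4$ for $j \neq k$ and $E(x_i^2) = 1/3$ for the uniform distribution, we get

$$E\left((x_{i+1} - x_i)(x_i - x_{i-1})\right) = E(x_{i+1}x_i) - E(x_{i+1}x_{i-1}) - E(x_i^2) + E(x_i x_{i-1}) = \frac14 - \frac14 - \frac13 + \frac14 = -\frac{1}{12}. \qquad (3)$$

This anti-correlation indeed leads to the predictability mentioned above, namely that the best predictor
for $x_{i+1} - x_i$ is that $x_{i+1} - x_i$ be of the sign opposite to $x_i - 1/2$.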
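A quick Monte Carlo check of (3) follows. It is a sketch under our own assumptions: the standard-library `random` generator, a hypothetical helper `increment_lag1_stats`, and an arbitrary seed and sample size. It also reports the lag-one correlation coefficient of the increments, which is $-1/2$ because each increment has variance $2 \times 1/12 = 1/6$.

```python
import random

def increment_lag1_stats(n=200_000, seed=2):
    """Sample covariance and correlation of successive increments of i.i.d. U[0,1] data."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    d = [b - a for a, b in zip(x, x[1:])]   # increments x_{i+1} - x_i
    pairs = list(zip(d, d[1:]))              # (x_i - x_{i-1}, x_{i+1} - x_i)
    m = len(pairs)
    mean_a = sum(a for a, _ in pairs) / m
    mean_b = sum(b for _, b in pairs) / m
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs) / m
    var = sum((a - mean_a) ** 2 for a, _ in pairs) / m
    return cov, cov / var

cov, corr = increment_lag1_stats()
print(f"covariance {cov:.4f} (eq. (3): {-1/12:.4f}), correlation {corr:.3f}")
```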

Another way to understand where the predictability of the increments of uncorrelated variables comes
from is to realize that increments are discrete realizations of the differentiation operator. Under its action,
a flat (white noise) spectrum becomes colored towards the "blue" (the opposite of the well-known
action of integration, which "reddens" white noise), and a short-range correlation thus appears in
the increments.
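To make the "blue" spectrum concrete, here is a standard linear-filter computation that the text above does not spell out. It assumes a stationary white-noise input $x_i$ with variance $\sigma^2$ and treats the increment $y_i = x_i - x_{i-1}$ as a filter with transfer function $1 - e^{-i\omega}$:

```latex
\begin{aligned}
S_y(\omega) &= \left|1 - e^{-i\omega}\right|^2 \sigma^2 = 2\,(1 - \cos\omega)\,\sigma^2 = 4 \sin^2(\omega/2)\,\sigma^2, \\
\gamma_y(0) &= 2\sigma^2, \qquad \gamma_y(\pm 1) = -\sigma^2, \qquad \gamma_y(k) = 0 \ \text{for}\ |k| \ge 2 .
\end{aligned}
```

The spectral density vanishes at $\omega = 0$ and peaks at the Nyquist frequency $\omega = \pi$. The lag-one correlation of the increments is $-1/2$ whatever the distribution of $x$. For the uniform case, $\sigma^2 = 1/12$, so $\gamma_y(1) = -1/12$, in agreement with (3).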



2.2 Probability for a successful prediction 

A natural question is to determine the success rate $p_+$ of this strategy, i.e. the probability that the sign of
the increment $x_{i+1} - x_i$ be, as predicted, equal to the sign of $1/2 - x_i$. To address this question, we study
the following quantity

$$\epsilon = E\left(\text{sign}(x_{i+1} - x_i)\, \text{sign}(1/2 - x_i)\right), \qquad (4)$$

where the product of signs inside the expectation operator is $+1$ if the prediction is borne out by the data
and $-1$ otherwise. The relationship between $\epsilon$ and $p_+$ is

$$\epsilon = (+1)\,p_+ + (-1)(1 - p_+) = 2p_+ - 1 \quad \Longrightarrow \quad p_+ = \frac12 + \frac{\epsilon}{2}. \qquad (5)$$
Expression (5) shows that $\epsilon$ quantifies the deviation from the random coin toss result $p_+ = 50\%$. From the
definition (4), we have

$$\epsilon = \int_0^{1/2} dx_i\,(+1)\left(\int_0^{x_i} dx_{i+1}\,(-1) + \int_{x_i}^{1} dx_{i+1}\,(+1)\right) + \int_{1/2}^{1} dx_i\,(-1)\left(\int_0^{x_i} dx_{i+1}\,(-1) + \int_{x_i}^{1} dx_{i+1}\,(+1)\right), \qquad (6)$$

which gives

$$\epsilon = 1/2 \quad \text{and thus} \quad p_+ = 75\%. \qquad (7)$$
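For readers who want the intermediate step, the inner integrals in (6) are identical in both terms and can be evaluated first. The following expansion is our own:

```latex
\begin{aligned}
\int_0^{x_i} dx_{i+1}\,(-1) + \int_{x_i}^{1} dx_{i+1}\,(+1) &= -x_i + (1 - x_i) = 1 - 2x_i, \\
\epsilon = \int_0^{1/2} (1 - 2x_i)\,dx_i - \int_{1/2}^{1} (1 - 2x_i)\,dx_i &= \frac14 - \left(-\frac14\right) = \frac12 .
\end{aligned}
```

Relation (5) then gives $p_+ = 1/2 + 1/4 = 3/4$.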

Figure 1 shows a numerical simulation which evaluates $p_+$ as a function of the cumulative number of realisations
with the strategy that $x_{i+1} - x_i$ is predicted to have the sign opposite to $x_i - 1/2$, using a pseudo-random number
generator with values uniformly distributed between 0 and 1.
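A simulation in the spirit of Figure 1 can be sketched as follows. This is not the authors' program: it assumes Python's `random` module as the pseudo-random generator, and the helper name `running_success_rate`, the seed and the number of realisations are hypothetical choices. The function returns the cumulative estimate of $p_+$ after each realisation.

```python
import random

def running_success_rate(n=100_000, seed=3):
    """Cumulative success rate of predicting sign(x_{i+1} - x_i) = -sign(x_i - 1/2)."""
    rng = random.Random(seed)
    x_prev = rng.random()
    hits = 0
    history = []
    for k in range(1, n + 1):
        x_next = rng.random()
        predicted_up = x_prev < 0.5              # predict a rise when x_i is below the mean 1/2
        hits += (x_next > x_prev) == predicted_up
        history.append(hits / k)                 # estimate of p_+ after k realisations
        x_prev = x_next
    return history

p = running_success_rate()
print(p[9], p[999], p[-1])   # early estimates fluctuate; the last one should be close to 0.75
```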



3 General derivation for arbitrary distributions 

This result is actually quite general. Consider an arbitrary random variable $x_i$ with an arbitrary probability
density distribution $P(x)$ with average $\langle x \rangle$. We form the centered variable

$$\delta_i = x_i - \langle x \rangle, \qquad (8)$$

with zero mean $\langle \delta \rangle = 0$, whose pdf we also denote $P(\delta)$. Similarly to (2), we study the conditional expectation of its increments
$\delta_{i+1} - \delta_i$, given the last realization $\delta_i$:



$$E(\delta_{i+1} - \delta_i | \delta_i) = E(\delta_{i+1} | \delta_i) - E(\delta_i | \delta_i) = -\delta_i, \qquad (9)$$

where we have used the fact that the $\delta_i$'s are independent, so that $E(\delta_{i+1}|\delta_i) = E(\delta_{i+1}) = 0$. Thus, the best predictor of the sign of the
increment of the $\delta$'s is the opposite of the sign of the last realization. We then quantify the probability of
prediction success through the quantity $\epsilon$, defined similarly to (4) as

$$\epsilon = -E\left(\text{sign}(\delta_{i+1} - \delta_i)\, \text{sign}(\delta_i)\right), \qquad (10)$$

which is related to the success probability $p_+$ by (5). It is easily calculated as



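$$\epsilon = \int_{-\infty}^{0} d\delta_i\, P(\delta_i)\,(+1)\left(\int_{-\infty}^{\delta_i} d\delta_{i+1}\, P(\delta_{i+1})\,(-1) + \int_{\delta_i}^{+\infty} d\delta_{i+1}\, P(\delta_{i+1})\,(+1)\right) + \int_{0}^{+\infty} d\delta_i\, P(\delta_i)\,(-1)\left(\int_{-\infty}^{\delta_i} d\delta_{i+1}\, P(\delta_{i+1})\,(-1) + \int_{\delta_i}^{+\infty} d\delta_{i+1}\, P(\delta_{i+1})\,(+1)\right). \qquad (11)$$

It is convenient to introduce the cumulative distribution

$$F(x) = \int_{-\infty}^{x} d\delta\, P(\delta) \qquad (12)$$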



and the probabilities $F_- = F(0)$ (resp. $F_+ = 1 - F(0)$) that $\delta$ be less (resp. larger) than 0. Expression (11)
transforms into

$$\epsilon = F_- - F_+ - (F(0))^2 + (F(+\infty))^2 - (F(0))^2, \qquad (13)$$

where we have used the identity $2\int_{-\infty}^{y} dx\, P(x)\, F(x) = [F(y)]^2$. Using the definitions of $F_-$ and $F_+$ and the
normalization $F(+\infty) = 1$ leads to

$$\epsilon = 2F_+(1 - F_+) \quad \text{and} \quad p_+ = \frac12 + F_+(1 - F_+). \qquad (14)$$
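The algebra from (13) to (14), which the text compresses into one step, can be written out by substituting $F(+\infty) = 1$ and $F(0) = F_- = 1 - F_+$:

```latex
\begin{aligned}
\epsilon &= (1 - F_+) - F_+ - 2\,(1 - F_+)^2 + 1 \\
         &= 2 - 2F_+ - 2 + 4F_+ - 2F_+^2 \\
         &= 2F_+\,(1 - F_+), \\
p_+      &= \tfrac12 + \tfrac{\epsilon}{2} = \tfrac12 + F_+\,(1 - F_+) .
\end{aligned}
```

Since $F_+(1 - F_+) \le 1/4$, with equality only at $F_+ = 1/2$, this also shows directly that $p_+ \le 3/4$.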
For symmetric distributions, and more generally for any distribution such that $F_+ = 1/2$, we retrieve the previous result
(7). This result is thus seen to be very general and independent of the shape of the distribution of the i.i.d.
variables as long as $F_+ = 1/2$ (attained in particular, but not exclusively, for symmetric distributions). Note
that the value $p_+ = 75\%$ is the largest possible, attained for $F_+ = 1/2$. For $F_+ \neq 1/2$ (with $0 < F_+ < 1$), $0.5 < p_+ < 0.75$.
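Formula (14) can be tested on an asymmetric law. The sketch below is our own choice of example and is not taken from the paper. It assumes standard-library `random` sampling and compares a symmetric Laplace variable ($F_+ = 1/2$) with an exponential variable of unit mean centred on its mean, for which $F_+ = P(X > 1) = e^{-1}$ and (14) predicts $p_+ = 1/2 + e^{-1}(1 - e^{-1}) \approx 0.733$. The helpers `empirical_p_plus` and `predicted_p_plus` are hypothetical names.

```python
import math
import random

def empirical_p_plus(samples):
    """Fraction of i with sign(d_{i+1} - d_i) == -sign(d_i), the strategy behind eq. (10)."""
    hits = total = 0
    for d_now, d_next in zip(samples, samples[1:]):
        if d_now == 0 or d_next == d_now:
            continue                                  # ties have zero probability for continuous laws
        total += 1
        hits += (d_next > d_now) == (d_now < 0)
    return hits / total

def predicted_p_plus(f_plus):
    """Eq. (14): p_+ = 1/2 + F_+ (1 - F_+)."""
    return 0.5 + f_plus * (1.0 - f_plus)

rng = random.Random(4)
n = 200_000
# Symmetric case: Laplace variables (difference of two Exp(1) draws), so F_+ = 1/2.
laplace = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]
# Asymmetric case: Exp(1) centred on its mean 1, so F_+ = P(X > 1) = exp(-1).
expo = [rng.expovariate(1.0) - 1.0 for _ in range(n)]

print(empirical_p_plus(laplace), predicted_p_plus(0.5))
print(empirical_p_plus(expo), predicted_p_plus(math.exp(-1)))
```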

Figure 2 shows the estimation of $p_+$ for the thirty-year US treasury bond TYX from Oct. 29, 1993
to Aug. 9, 1999. Specifically, we start from the daily close quotes $q(t)$ and construct the price variations
$\delta q(t) = q(t) - q(t-1)$. We try to predict the variation of $\delta q(t)$ with the strategy that $\delta q(t+1) - \delta q(t)$
is predicted to have the sign opposite to $\delta q(t) - \langle \delta q \rangle$. The corresponding success probability $p_+$ is plotted as
a function of time, cumulating the realizations to estimate $p_+$. As expected, large fluctuations at the beginning
reflect the limited statistics. As the statistics improve, $p_+$ converges to the predicted value of
75%. We note that, in comparison to the pseudo-random number series shown in figure 1, the convergence
seems to occur at a similar rate, suggesting that there are no appreciable global short-range correlations, in
agreement with many previous statistical tests [...].
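The procedure behind Figure 2 can be sketched as a function of a list of daily closes. This is an illustration under stated assumptions, not the authors' code. The hypothetical helper `acceleration_success_rate` centres on the full-sample mean $\langle \delta q \rangle$, which uses look-ahead; a causal running mean would be an alternative. It also skips exact ties, which tick-rounded real prices can produce. No TYX data are included, so the usage example feeds it a synthetic Gaussian random walk.

```python
import random

def acceleration_success_rate(closes):
    """Running p_+ for predicting sign(dq(t+1) - dq(t)) = -sign(dq(t) - <dq>).

    dq(t) = q(t) - q(t-1) is built from the list of daily closes q(t).
    """
    dq = [b - a for a, b in zip(closes, closes[1:])]
    mean_dq = sum(dq) / len(dq)        # full-sample mean (look-ahead assumption)
    hits = total = 0
    running = []
    for now, nxt in zip(dq, dq[1:]):
        if nxt == now or now == mean_dq:
            continue                   # skip ties instead of scoring them
        total += 1
        hits += (nxt > now) == (now < mean_dq)
        running.append(hits / total)
    return running

# Usage on a synthetic random walk (hypothetical data, not the TYX series).
rng = random.Random(5)
closes = [100.0]
for _ in range(50_000):
    closes.append(closes[-1] + rng.gauss(0.0, 1.0))
rates = acceleration_success_rate(closes)
print(rates[-1])   # for i.i.d. Gaussian price variations this should be close to 0.75
```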



4 Discussion 



This paradoxical result tells us that one can, on average, succeed three times out of four in guessing
the sign of the increment of uncorrelated random variables. This is quite surprising a priori but, as
we explained above, it stems from the action of the differentiation operator, which makes the spectrum "bluish",
thus introducing short-range correlations.

This predictive skill does not lead to any anomalies. Consider for instance the time series of price
returns of a stock market. According to the efficient market hypothesis (ref. [...] and references therein)
and the random walk model, successive (say daily) price returns of liquid organized markets are essentially
independent with approximately symmetric distributions. Our result (14) then shows that we can predict
with 75% accuracy the sign of the increment of the daily returns (and not the sign of the returns, which are
proportional to the increments of the prices themselves). This predictive skill is not associated with any arbitrage
opportunity in market trading. This can be seen as follows. For simplicity of language, we consider price
returns $\delta$'s relative to their average, so that we deal with uncorrelated variables with zero mean as defined in
(8). In addition, we restrict our discussion to the optimal case where $F_+ = 1/2$. Consider first the situation
where $\delta_i$ is positive and quite large (say two standard deviations above zero). We expect any typical
realization, and in particular the next one, to be positive or negative but close to zero, within say one
standard deviation. This implies that we expect with a large probability $\delta_{i+1}$ to be smaller than $\delta_i$. This
guess is compatible with, and in fact constructs, the result (14). Consider now the second situation
where $\delta_i$ is positive but very small and close to zero. We then have, by construction of the process, that
$\delta_{i+1}$ will be larger or smaller than $\delta_i$ with probability close to 1/2. In this case, we lose any predictive
skill. What the result (14) quantifies mathematically is that all these types of realizations average out to a
global probability of 75% for the sign of the increment to be predicted by the sign of $-\delta_i$. This large value
does not give us any "martingale" (in the gambling sense of the word). It simply states that, for
independent realizations, large values have to be followed by smaller ones. This analysis relies fundamentally
on the independence between successive occurrences of the variables $\delta_i$. Predicting with 75% probability the
sign of $\delta_{i+1} - \delta_i$ does not improve in any way our success rate for predicting the sign of $\delta_{i+1}$ (which would
be the real arbitrage opportunity).
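The distinction between forecasting the increment and forecasting the next value can be seen in a short simulation. This is a sketch under our own assumptions: i.i.d. zero-mean Gaussian "returns" drawn with Python's `random.Random.gauss`, and a hypothetical helper `compare_predictions`. The same signal, $-\text{sign}(\delta_i)$, scores about 75% on the increment but only about 50% on the sign of $\delta_{i+1}$ itself.

```python
import random

def compare_predictions(n=200_000, seed=6):
    """Contrast two uses of the signal -sign(d_i) for i.i.d. zero-mean Gaussian d_i.

    increment_rate: how often sign(d_{i+1} - d_i) == -sign(d_i)   (expected about 0.75)
    level_rate:     how often sign(d_{i+1})       == -sign(d_i)   (expected about 0.50)
    """
    rng = random.Random(seed)
    d = [rng.gauss(0.0, 1.0) for _ in range(n)]
    inc_hits = level_hits = 0
    for now, nxt in zip(d, d[1:]):
        inc_hits += (nxt > now) == (now < 0)     # forecast of the increment
        level_hits += (nxt > 0) == (now < 0)     # forecast of the next value itself
    m = n - 1
    return inc_hits / m, level_hits / m

increment_rate, level_rate = compare_predictions()
print(increment_rate, level_rate)
```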

Deviations from $p_+ = 75\%$, and in particular results larger than 75%, which is the maximum in the
uncorrelated case (see (14)), signal the presence of correlations. An instance is shown in figure 3, which plots
$p_+$ for the prediction of the variations of the isotopic deuterium time series from the Vostok (Antarctica) ice
core sample, which is a proxy for the local temperature from about 220 ky in the past to the present. The data
is taken from [...]. We observe that $p_+$ remains above 75%, showing a significant genuine anti-correlation.
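How correlations move $p_+$ away from 75% can be illustrated on a synthetic process. This is our own example, not the Vostok data: we assume a Gaussian AR(1) series $\delta_{i+1} = \phi\,\delta_i + e_i$ and a hypothetical helper `ar1_success_rate`. A bivariate-normal orthant calculation, which we did ourselves, gives $p_+ = 5/6$ for $\phi = -0.5$ (anti-correlation), $3/4$ for $\phi = 0$ and $2/3$ for $\phi = +0.5$.

```python
import math
import random

def ar1_success_rate(phi, n=200_000, seed=7):
    """p_+ of the 'opposite sign to d_i' increment strategy on a Gaussian AR(1) series.

    d_{i+1} = phi * d_i + e_i with e_i ~ N(0, 1) and |phi| < 1; the series is started
    from its stationary law N(0, 1 / (1 - phi^2)) so no burn-in is needed.
    """
    rng = random.Random(seed)
    d = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi * phi))
    hits = 0
    for _ in range(n):
        d_next = phi * d + rng.gauss(0.0, 1.0)
        hits += (d_next > d) == (d < 0)   # success if the increment has the sign of -d_i
        d = d_next
    return hits / n

for phi in (-0.5, 0.0, 0.5):
    print(phi, round(ar1_success_rate(phi), 3))
```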

Acknowledgements: We thank P. Yiou for providing the temperature time series and S. Gluzman for 
stimulating discussions. 
Figure 1: Numerical simulation which evaluates $p_+$ as a function of the cumulative number of realisations with
the strategy that $x_{i+1} - x_i$ is predicted to have the sign opposite to $x_i - 1/2$, using a pseudo-random number
generator with values uniformly distributed between 0 and 1.
Figure 2: Predictability of the increments of the price variations, i.e. the acceleration, of the thirty-year
US treasury bond TYX from Oct. 29, 1993 to Aug. 9, 1999. Specifically, we start from the daily close
quotes $q(t)$ and construct the price variations $\delta q(t) = q(t) - q(t-1)$. We try to predict the variation of $\delta q(t)$
with the strategy that $\delta q(t+1) - \delta q(t)$ is predicted to have the sign opposite to $\delta q(t) - \langle \delta q \rangle$. The corresponding
success probability is plotted as a function of time, cumulating the realizations to estimate $p_+$.
Figure 3: Success probability $p_+$ for the prediction of the variations of the isotopic deuterium time series
from the Vostok (Antarctica) ice core sample, which is a proxy for the local temperature from about 220
thousand years in the past to the present. The data is taken from [...].